Abstract



This article discusses the research potential of some of the National Center for Education Statistics' data sets, specifically those focused on junior and senior high school students. All share some characteristics, but the most recent, the National Education Longitudinal Study of 1988 (NELS:88), is the most comprehensive. Since data are gathered not only from the students themselves but also from their parents, teachers, and school administrators, researchers can begin to put the educational process "in context" and include factors outside the classroom in their analyses. The data are available on CD-ROM, a format that has advantages as well as limitations. Given that NELS:88 began when the students were in the eighth grade, it is possible to study the gender gap in mathematics and science. Missing data, internal inconsistencies, and a lack of school contextual data are problems. In addition, as presently constructed, these data sets cannot be used to study students attending schools in the very large cities.

Large Data Sets: Opportunities and Challenges for Educational Researchers



Since 1972, the National Center for Education Statistics (NCES) has collected information on the achievements, behaviors, and attitudes of students in public and private schools across the nation and on the attitudes and behaviors of their parents, teachers, and administrators. These data sets have grown larger, more elaborate, and more inclusive with each new issuance. At present, at least four major collections offer educational scholars and policy analysts nationally representative, longitudinal data on adolescents that have yet to be exhausted.

In what follows, we consider both the rewards and the frustrations of working with large data sets in general and with the National Education Longitudinal Study begun in 1988 (NELS:88) in particular. Throughout, we draw on our work at the Center for Research in Human Development and Education (CRHDE) at Temple University in Philadelphia. Although we use NELS:88 to illuminate the pitfalls of dealing with missing data and inconsistent responses, these problems arise when working with any large data set. On the other hand, a specific and primary strength of the NELS:88 data set is that it affords a unique opportunity to study the transition from middle to high school. We have harnessed this aspect of NELS:88 to study the gender gap in math, but many other research areas might benefit from this opportunity. Interestingly, even as it enables one research program, NELS:88 curtails another. Because NELS:88 embodies serious errors in the way "urban" has been defined, the data as presently constructed cannot be used to adequately address issues surrounding inner-city education, as we have discovered in our research on "students at risk."

NCES Data Sets



Housed in the U.S. Department of Education, the National Center for Education Statistics (NCES) is responsible for collecting statistics on the condition of education in the nation. In this capacity, it maintains and analyzes a number of databases. The major cross-sectional elementary/secondary databases are the Common Core of Data (CCD), the Schools and Staffing Survey, and the Private School Survey. At the postsecondary level, the available cross-sectional databases are the Integrated Postsecondary Education Data System, the National Postsecondary Student Aid Study, the Recent College Graduate Study, the National Survey of Postsecondary Faculty, and the Survey of Earned Doctorates Awarded in the United States. The National Household Education Survey (NHES) includes data on pre-primary, primary, and adult literacy patterns. Some of these data sets allow comparisons over time (e.g., CCD) and some do not (e.g., NHES).

For the past three decades, NCES has conducted ongoing studies of adolescents nationwide. The first of these, the National Assessment of Educational Progress (NAEP), was begun in 1969 and has been repeated at one- and two-year intervals since. NAEP samples 9-, 13-, and 17-year-old students (i.e., those in grades 4, 8, and 12) of various racial backgrounds in public and private schools and is chiefly concerned with documenting changes in curriculum and performance. Approximately 40,000 students per grade level (146,000 total) are sampled in any given year, along with their teachers and school administrators. NAEP is not a panel study: the same students are not followed from one survey to the next.

Three major student-focused longitudinal databases followed NAEP and run concurrently with it: the National Longitudinal Study of the High School Class of 1972 (NLS:72), High School and Beyond (HS&B), and NELS:88. Each successive project both adds new elements and retains some from its forerunner(s) to facilitate survey-to-survey comparisons. With each new database, students were first surveyed at a younger age. Thus, NLS:72 included only seniors, whereas HS&B added sophomores and NELS:88 began with eighth graders (NCES's next project will start when the students are in first grade).

NLS:72 is a national probability sample of 19,001 students designed to be representative of the nation's approximately three million high school seniors in more than 17,000 schools. This large data set follows that group of young people, at intervals, through the 15 important transitional years following high school (to 1986). The NLS:72 survey includes both achievement measures (standardized scores, transcript information, and grade point averages) and demographic data. (For more information, see Ingels, Karr, Spencer, & Frankel, 1990.)

The goal of the second large project, HS&B, was "to inform Federal and State policy in the 1980s" (Sebring, Campbell, Glusberg, Spencer, & Singleton, 1987, p. 2). Begun in the spring of 1980, HS&B represented a departure from previous NCES programs in that the initial survey included both sophomore and senior cohorts. The 58,000 students (30,000 seniors and 28,000 sophomores) have been periodically resurveyed. HS&B data include performance measures (test scores) as well as attitudinal, background, school activities, and work data for students, and attitudinal and background information for their parents, teachers, and school administrators.

Recently, NCES has initiated two new longitudinal surveys. The Beginning Postsecondary Students Longitudinal Study is the first study to begin with students just entering postsecondary education. Data for the Baccalaureate and Beyond survey were first collected in 1993; this data set focuses on a cohort of students who are near graduation from college and are about to enter the work force or graduate education. Presently, NCES is planning yet another national longitudinal survey, this one beginning with very young children. Data collection is to begin in 1997.

The National Education Longitudinal Study (NELS:88)



The most recent NCES program, NELS:88, was initiated in 1988 with approximately 25,000 eighth-grade students. The stated goals of this third large-scale project are more comprehensive than those of any education longitudinal study to date. "A central theme is that education in America must be understood as a lifelong process enmeshed in a complex social context. This study is also intended to produce a comprehensive data set for the development and evaluation of educational policy at all governmental levels" (Ingels et al., 1990, p. 5). The initial sample of eighth graders was re-surveyed as high school sophomores (1990: the First Follow-Up) and seniors (1992: the Second Follow-Up).

Like HS&B, NELS:88 employs a two-stage sampling design: schools were selected first, and then a random selection of students was made from those chosen schools. Groups oversampled were public schools with high enrollments of Asian and/or Hispanic students, Catholic schools with high minority enrollments, alternative schools, and private schools with high-achieving students. This strategy ensured large enough numbers of Asian and Hispanic students to enable statistical analyses. In cases of sample mortality, samples were freshened with demographically similar students from already selected schools. Elaborate weighting factors must be used to compensate for these design effects. "Panel flags" allow for selecting specific populations for whom particular data are available, for example, only the students for whom teacher information is also available, or who participated in all three (or two) surveys.
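
As a minimal sketch of how a panel flag and a weight enter an estimate, the Python function below computes a weighted mean over flagged students only. The field names (`panel_flag`, `panel_weight`, `math`) and the three records are hypothetical placeholders, not actual NELS:88 variable names or data; the user's manuals document the real flags and weights. The sketch produces a point estimate only; standard errors for a two-stage, clustered design require design-based variance methods not shown here.

```python
def weighted_mean(records, value_key, weight_key="panel_weight", flag_key="panel_flag"):
    """Weighted mean of `value_key` over records whose panel flag equals 1.

    Field names are hypothetical placeholders; records with a missing (None)
    value are skipped rather than imputed.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for rec in records:
        # Keep only students flagged as having the data of interest.
        if rec.get(flag_key) != 1:
            continue
        value = rec.get(value_key)
        if value is None:
            continue
        weight = rec[weight_key]
        weighted_sum += weight * value
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no flagged records with a non-missing value")
    return weighted_sum / total_weight


students = [
    {"math": 50.0, "panel_flag": 1, "panel_weight": 100.0},
    {"math": 60.0, "panel_flag": 1, "panel_weight": 300.0},
    {"math": 90.0, "panel_flag": 0, "panel_weight": 500.0},  # outside the panel
]
print(weighted_mean(students, "math"))  # 57.5
```

Applying the flag before the weights matters: had the third record been included, its large weight would have pulled the estimate well above 57.5.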

Over 17,000 students participated in both the Base Year and First Follow-Up surveys, and approximately 16,000 took part in all three.

The student surveys included information about the students' socioeconomic status (SES), perceptions of self, school life, family and educational experiences, and career aspirations. Also available are an elaborate set of variables based on students' scores on a battery of cognitive tests developed by the Educational Testing Service, and transcript files that document students' course-taking behaviors and other achievements (e.g., awards, grades). Although students who have been identified as "dropouts" participate in the general student survey, NELS:88 also maintains separate data files for these students. The focus in these files is on why the students left school before graduation and what they are doing instead of going to school. The parent survey, also initiated in the 1988 base year, includes information about the nature and extent of parental support for their children's educational activities as well as standard demographic data. Student files are supplemented in both follow-ups, and parent files in 1992, with responses from the newly added ("freshened") participants.

The teacher survey contains information from two of each student's teachers on the teachers' background, instructional practices, attitudes, and perceptions of their students' performance. Finally, the school survey, completed by the administrators of the schools from which students were sampled, offers information about school resources, programs, policies, and some demographic characteristics. The school data, collected in each survey year, contain slightly different information from year to year. In the First Follow-Up file, for example, numerous junior/senior high transition variables were included; in the Second Follow-Up, the focus was on school-to-work transitions.

NCES constructed two types of data files for NELS:88. In the public use files, some variables (e.g., school enrollment, percentage of students receiving free or reduced-cost lunches, the state where the school is located) were suppressed or categorized to preserve confidentiality. The restricted use files contain almost all of the information NCES collected. Restricted data are available to educational researchers who apply for and meet NCES licensing requirements; for details, consult the NCES publication Field Restricted Use Data Procedures Manual.

The NELS:88 Data: Technological Advantages and New Research Opportunities

NELS:88 is especially useful for considering large data sets in general, and not simply because it is the newest and thus far most comprehensive of the NCES projects. From support to documentation, NELS:88 offers a number of leading-edge advantages in data analysis applications, for example, making the data sets available on CD-ROM, an advantage we discuss in greater detail below. Moreover, beyond technology, NELS:88 is inspirational. Because it is so inclusive and because it is the first NCES data set to follow students from middle to high school, NELS:88 opens new vistas for educational research and breathes new life into existing research programs.

As for using NELS:88, you will find that technical support from NCES is excellent, as is the early documentation: books for each sample (students, parents, etc.) reproduce the questionnaire, list the coding scheme, and detail the procedures for using weights, flags, scales, and test scores (Ingels et al., 1990). This format is handier and easier to use than that of HS&B, where huge all-inclusive books were published intermittently to cover several years' worth of surveys (Sebring et al., 1987). Unfortunately, the NELS:88 Second Follow-Up did not come with the same documentation as the earlier two phases of the project; according to NCES, printing costs are becoming prohibitive. There are, however, a number of possible solutions to this dilemma. NCES could sell the documentation, produce it on CD-ROM or computer disk, or even distribute it over the Internet via the File Transfer Protocol (FTP), giving researchers the option of purchasing, or downloading and printing out, their own copies. A final technological innovation, which we consider in more detail below, is the availability of the data sets on CD-ROM.

Beyond the concerns of day-to-day applications, NELS:88 is a comprehensive and inclusive set of instruments, offering data of interest to scholars with a variety of research agendas. For example, students are asked not only about their school and family but also about violence and drugs. Teachers describe classroom routines as well as provide evaluative information about their students. Administrators completed their questionnaire on the basis of the entire school, not just the students included in the NELS:88 sample, and so the School file is particularly useful in ferreting out "school effects" (see Gamoran, 1987; Stull, Rigsby, & Morse-Kelly, 1995b). For those who can obtain an NCES license, NELS:88 data are further enhanced by the suppressed variables that, while continuing to protect the anonymity of the individual participants, make highly specific analyses possible, for example, state-by-state comparisons.

Perhaps most important from the standpoint of research design is that, unlike previous data sets, NELS:88 begins with eighth grade. This allows observation of a critical period, the transition from junior to senior high school. For researchers interested in dropout prevention, for example, it is possible for the first time to identify the "early dropouts," those students who drop out by the sophomore year of high school. Among students who participated in both the Base Year and First Follow-Up surveys, 856 left school (some more than once) between eighth grade and sophomore year. In the Second Follow-Up, 1,796 dropouts participated in the survey. Attention to these subpopulations can yield critical information to help better identify students at risk of never graduating. Another area in which transition data can be invaluable is the gender gap in math, a continuing area of research at CRHDE that we discuss at length below.

NELS:88 Applications: The Power of CD-ROM Technology

Presently, NELS:88 offers the public use data on CD-ROM. The CDs are easy to use, and the software used to extract the variables is straightforward. Using CD-ROM technology, researchers are not dependent on the vicissitudes of a mainframe and are freed from the sometimes complicated tape manipulation and other mainframe requirements. Although the bulk of our analyses have been done on a mainframe, we have used this CD-ROM technology with SPSS software (Windows version; a DOS version is also available) and have found it useful in doing some of the secondary analyses. With some minor programming changes to the SPSS syntax files generated by the CD-ROM software, users can produce simple tabulations within minutes. With a little practice, more complex analyses (say, weighted within-group regressions) are also possible.
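
For readers who want to see what a weighted within-group regression computes, here is a small Python sketch (not SPSS syntax) that fits a separate weighted least-squares line within each group using the closed-form slope and intercept. The field names (`sex`, `hours`, `math`, `weight`) and the toy records are illustrative assumptions, not NELS:88 variables or results.

```python
from collections import defaultdict


def weighted_line(xs, ys, ws):
    """Weighted least-squares intercept and slope for y = a + b*x."""
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    if sxx == 0:
        raise ValueError("predictor has no variation in this group")
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def within_group_regressions(records, group_key, x_key, y_key, w_key):
    """Fit one weighted line per group; returns {group: (intercept, slope)}."""
    columns = defaultdict(lambda: ([], [], []))
    for rec in records:
        xs, ys, ws = columns[rec[group_key]]
        xs.append(rec[x_key])
        ys.append(rec[y_key])
        ws.append(rec[w_key])
    return {g: weighted_line(xs, ys, ws) for g, (xs, ys, ws) in columns.items()}


# Toy records (hypothetical): weekly homework hours vs. a math score.
records = [
    {"sex": "F", "hours": 0, "math": 1, "weight": 1},
    {"sex": "F", "hours": 1, "math": 3, "weight": 1},
    {"sex": "F", "hours": 2, "math": 5, "weight": 1},
    {"sex": "M", "hours": 0, "math": 0, "weight": 1},
    {"sex": "M", "hours": 2, "math": 4, "weight": 3},
]
print(within_group_regressions(records, "sex", "hours", "math", "weight"))
```

Fitting each group separately lets intercepts and slopes differ by group, which a single pooled regression with a group dummy would not allow for the slope.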

The costs of CD-ROM technology, however, can be considerable. To use the CD-ROM data set, you will need a high-end computer with an adequate hard drive and a CD-ROM drive, as well as SPSS or SAS software. Conservatively, this could cost a few thousand dollars, which may put CD-ROM technology out of reach for many individuals or their institutions, particularly academic institutions, which tend to lag behind in updating faculty computer technology. A further consideration is that statistical analyses tend to be paper-intensive and in this respect are perhaps more suitable to mainframe resources.

NELS:88 Research Opportunities: Transition Data and the Gender Gap

Although researchers have called for generalizable studies of the gender gap (e.g., Berryman, 1983; Oakes, 1990), until NELS:88, data were not available to study male-female differences during the transition from junior to senior high school for a representative national sample. We used the first two waves of the project (1988, 1990; a population of about 17,000) to examine the gender gap in math. Because NELS:88 is comprehensive, we were able to measure the gender gap in a number of ways: performance, as represented by the cognitive subject test scores; perception, as measured by a generalized self-concept scale provided in the data files and a specific mathematics attitude scale constructed from several separate variables; and participation, as indicated by the number and kinds of math classes taken. Moreover, because NELS:88 (like other large data sets) includes critical demographic data, we could examine the gender gap as it manifests in different race/sex/SES groups.
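
The mathematics attitude scale mentioned above was built from several separate questionnaire items. One common way to build such a composite, shown here as a hedged Python sketch rather than as the procedure we actually used, is to standardize each item, reverse negatively worded items so that a higher value always means a more positive attitude, and average the items a student answered. The item names and responses are hypothetical.

```python
from statistics import mean, pstdev


def build_scale(rows, items, reverse=(), min_items=2):
    """Average of z-scored items, with items in `reverse` negated.

    rows: list of dicts; None marks a missing answer. Returns one score per
    row, or None when fewer than `min_items` items were answered.
    """
    # Item means and SDs computed from the non-missing answers only.
    stats = {}
    for item in items:
        observed = [r[item] for r in rows if r[item] is not None]
        stats[item] = (mean(observed), pstdev(observed))
    scores = []
    for r in rows:
        zs = []
        for item in items:
            if r[item] is None:
                continue
            m, sd = stats[item]
            z = (r[item] - m) / sd if sd > 0 else 0.0
            zs.append(-z if item in reverse else z)
        scores.append(mean(zs) if len(zs) >= min_items else None)
    return scores


# Hypothetical 1-4 agreement items; "math_is_hard" is negatively worded.
rows = [
    {"like_math": 4, "math_is_hard": 1},
    {"like_math": 2, "math_is_hard": 3},
    {"like_math": None, "math_is_hard": 2},
]
print(build_scale(rows, ["like_math", "math_is_hard"], reverse={"math_is_hard"}))
```

Requiring a minimum number of answered items keeps a single response from standing in for the whole scale.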

Considering differences between the male and female populations in general, our analyses suggested that the gender gap already exists in eighth grade. Boys score slightly higher than girls, though with more variability, on standardized math tests in both eighth grade and sophomore year, and girls' scores on the self-confidence scale were noticeably lower than boys' in both surveys. Though girls took more (or more advanced) math courses than boys in eighth grade, they did not continue that practice when they got to high school. This is similar to what others have found (see Kline & Ortman, 1994).

Table 1, which shows distributions of math test scores, is representative of some of our further explorations of the gender gap and illustrates the importance of ethnic-sensitive analyses (Catsambis, 1994). The gender gap in math performance is not universal but rather varies from group to group: there are no sex differences among Asians and African Americans in either the Base Year or the First Follow-Up, whereas Latinos and whites evidence a gender gap in math performance as early as eighth grade. It is also worth noting that racial patterns are the same for both sexes: Asians score highest, with white students slightly behind, and Latinos and African Americans score about 10 points lower on average than Asians. Indeed, racial differences in performance seem more pronounced than gender differences, as the male-female difference in test scores is never as great as the Asian-African American difference.
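
The subgroup comparison behind Table 1 amounts to computing weighted mean scores for each race-by-sex cell and taking the male-minus-female difference within each race group. The Python sketch below assumes hypothetical field names (`race`, `sex`, `math`, `weight`) and uses invented toy records; it does not reproduce the values in Table 1.

```python
from collections import defaultdict


def gender_gap_by_group(records, score_key="math", weight_key="weight"):
    """Weighted male-minus-female mean score within each race group."""
    sums = defaultdict(float)
    weights = defaultdict(float)
    for r in records:
        cell = (r["race"], r["sex"])
        sums[cell] += r[weight_key] * r[score_key]
        weights[cell] += r[weight_key]
    gaps = {}
    for race in sorted({race for race, _ in weights}):
        male, female = (race, "M"), (race, "F")
        # Report a gap only when both sexes are present in the group.
        if male in weights and female in weights:
            gaps[race] = sums[male] / weights[male] - sums[female] / weights[female]
    return gaps


# Invented toy records for two hypothetical groups "A" and "B".
records = [
    {"race": "A", "sex": "M", "math": 60, "weight": 1},
    {"race": "A", "sex": "F", "math": 60, "weight": 1},
    {"race": "B", "sex": "M", "math": 55, "weight": 2},
    {"race": "B", "sex": "M", "math": 45, "weight": 2},
    {"race": "B", "sex": "F", "math": 48, "weight": 1},
]
print(gender_gap_by_group(records))
```

Reporting the gap separately for each group is what reveals that a pooled male-female difference can hide groups with no gap at all.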

<Table 1 about here>

By contrast, the gender gap in perception, as measured by the NELS self-concept scale, is much more dramatic, universal, and sex-specific. The mean for all girls, regardless of race and SES, is lower than that of their male counterparts in both the Base Year and First Follow-Up surveys. Interestingly, the scores for the African American girls were not as low as those for the three other race/ethnic groups.

These preliminary findings illustrate some of the ways in which the NELS:88 data can be used. We see here that the gender gap in math performance already exists by eighth grade for some students but has not yet emerged for others, while the gender gap in self-concept is already present in middle school and affects all females. The challenge now is to continue refining the model of the gender gap in performance, perception, and participation, and to assess how male-female differences are mediated by race, SES, ability, and school effects (for more complete results, see Morse-Kelly, 1995). Others have also found a gender gap in self-esteem (see American Association of University Women, 1992; Fennema, 1974).

The NELS:88 Data: Some Precautions 

Every large data set is to some extent a "work in progress," in that the successes and omissions of its predecessors and successors play some role in its conception and execution. Inevitably, then, as is the case with NELS:88, there are aspects of a project that limit its usefulness or that must be taken into account in analyzing the data. Some problems are not specific to NELS:88 but are a universal consideration in working with large data sets. Thus, though we use the NELS:88 data below to illustrate the pitfalls and challenges of missing values and inconsistent responses, these are topics researchers must grapple with when using any of the NCES databases. Other problems, such as omissions, apply only to NELS:88: although there appears to be more than enough information in the NELS:88 files, in reality there is not. Far more serious than the omissions, however, is the error in the survey design that we detail below: the definition of "urban."

Large Data Sets: The Recoding Challenge of Missing Values



Working with the NELS:88 data for the past two years, we have discovered the truth of the old aphorism that preparation is more than half the job. Some of the most critical and consequential decisions necessitated by large data sets are those pertaining to recoding and reclassifying missing data. For example, several researchers have devoted considerable attention to explaining the effect of ability grouping on educational outcomes (e.g., Slavin, Madden, Karweit, Livermon, & Dolan, 1990; Hallinan & Sørensen, 1987). Unfortunately, in NELS:88, 22% of the responses to the ability group variable are unusable, either because they are missing or because the student responded "don't know" or "classes not grouped by ability."

There are several strategies for dealing with missing answers. First, because NELS:88 contains different files for several populations, the best method for recovering missing data is to cross-reference student answers with those provided by teachers and administrators. Thus, schools that report not employing ability-grouped instruction should "match" the students who make such a claim. Similarly, teachers' indication of the students' ability group can be used to fill in missing answers (and to check for consistency). This method, however, will almost certainly not eliminate all the missing values, which makes choosing another strategy almost unavoidable.
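
The cross-referencing step can be written as a small decision rule. The Python sketch below encodes one possible precedence order, which is an assumption rather than a rule stated in the NELS:88 documentation: a school report of no ability grouping settles the question, a usable student answer is kept (with disagreement from the teacher noted), and the teacher's report fills in otherwise. The category labels are hypothetical.

```python
def fill_from_other_sources(student_answer, teacher_answer, school_groups):
    """Return (value, source) for one student's ability group.

    student_answer / teacher_answer: e.g. "high", "middle", "low", or None
    when missing or unusable. school_groups: False if the school reports no
    ability-grouped instruction, True if it does, None if unknown.
    The precedence order here is a hypothetical choice, not an NCES rule.
    """
    if school_groups is False:
        return "not grouped", "school"
    if student_answer is not None:
        # Keep the student's answer but record any disagreement.
        if teacher_answer is not None and teacher_answer != student_answer:
            return student_answer, "student (teacher disagrees)"
        return student_answer, "student"
    if teacher_answer is not None:
        return teacher_answer, "teacher"
    return None, "still missing"


print(fill_from_other_sources(None, "high", True))   # filled from the teacher
print(fill_from_other_sources(None, None, None))     # remains missing
```

Returning the source alongside the value keeps a record of which cases were filled by which file, so the choice of precedence can be revisited later.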

By far the easiest missing-data strategy is simply to ignore missing cases. Thus, in the case of ability groups, it could be assumed that the 22% missing are not classified into ability groups. An equally quick-and-easy strategy would be to reassign all missing values to some measure of central tendency of the sample population: missing cases could be assigned to the mode for nominal variables and to the mean for interval variables. In our example, the modal category is the Middle Ability Group. Interestingly, since researchers tend to be interested in the High and Low ability groups, the decision to recode missing cases this way may have the same practical effect as simply eliminating them. Either strategy, however, is potentially a problem because the "missing" group is typically not a random phenomenon. To illustrate: far from being a "representative subset" of the larger sample population, the group of students missing an ability group value contains more boys than girls and a larger proportion of African American, Latino, and low-SES students than the general population does (i.e., while 32% of the entire sample is low SES, 44% of the missing group is low SES). Thus, by dealing with missing cases too sweepingly, we remove from analysis precisely those students most critical to much educational research.
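
To see why mode substitution can mislead, the hedged Python sketch below does two things with invented toy records: it applies the quick mode substitution described above, and it compares the share of low-SES students among all cases with the share among cases missing an ability-group value. Field names are hypothetical and the shares are unweighted; with real NELS:88 data the comparison would use the appropriate weights.

```python
from collections import Counter


def impute_mode(values):
    """Replace None with the most common observed category."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to take a mode from")
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def share_low_ses(records, only_missing=False):
    """Unweighted share of low-SES students, optionally among missing cases."""
    pool = [r for r in records if not only_missing or r["group"] is None]
    return sum(r["ses"] == "low" for r in pool) / len(pool)


records = [
    {"group": None, "ses": "low"},
    {"group": None, "ses": "low"},
    {"group": "middle", "ses": "high"},
    {"group": "middle", "ses": "low"},
    {"group": "high", "ses": "high"},
]
print(share_low_ses(records), share_low_ses(records, only_missing=True))
print(impute_mode([r["group"] for r in records]))
```

In this toy example every missing case is low SES, yet mode substitution places all of them in the middle group, exactly the homogenizing effect the text warns against.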

There are, moreover, serious issues involved in homogenizing missing responses, either by discarding or by reassigning them. This problem stems, first of all, from the fact that there are different kinds of missing answers and they are not all equivalent. Some are legitimate (e.g., the school does not use ability-grouped instruction). Others result because the student refused to answer the question. These are very different phenomena and provide different kinds of information: one tells us about the school, and the other about the student. Similarly, if our theoretical orientation is that the school has a vested interest in keeping certain kinds of information from students, then we ought not to treat "don't know" answers as if they were no different from "school does not have" answers.

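One practical way to avoid homogenizing missing responses is to keep each kind of nonresponse as its own category rather than pooling them into a single missing value. The Python sketch below assumes hypothetical reserve codes (7, 8, 9); the actual codes for any NELS:88 item should be taken from its codebook.

```python
# Hypothetical reserve codes; check the codebook for each item's actual codes.
MISSING_TYPES = {
    7: "refused",
    8: "don't know",
    9: "legitimate skip",  # e.g., the school does not group by ability
}


def classify_response(code):
    """Return ("valid", code) or ("missing", reason), keeping reasons distinct."""
    if code in MISSING_TYPES:
        return "missing", MISSING_TYPES[code]
    return "valid", code


def tally_missing(codes):
    """Count each kind of missing answer separately instead of pooling them."""
    counts = {}
    for code in codes:
        kind, value = classify_response(code)
        if kind == "missing":
            counts[value] = counts.get(value, 0) + 1
    return counts


print(tally_missing([1, 2, 7, 8, 8, 9]))
```

Keeping the reasons separate means "school does not group" can later be treated as information about the school, while "refused" and "don't know" remain information about the student.
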
Moreover, as the standardized test scores illustrated, students in the NELS:88 sample are not homogeneous: there are wide dispersions among the eight sex/race groups. The integrity of any analysis would surely be compromised were students to be treated as though the group mean or mode (which generally reflects the mean for whites) accurately represented all the different groups in the sample. The most theoretically and methodologically sound way of dealing with missing data, therefore, is the hot deck procedure, that is, assigning missing cases the same value as that of cases with similar characteristics. Thus, for ability groups, it might make sense to reassign missing values to the mode (mean) of students with the same race, sex, and SES, to capture individual characteristics. We might also include in this set of characteristics the type of school (public or private) and the size of the eighth-grade class, since the presence (or absence) of ability-grouped instruction can be the result of school characteristics.
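
A minimal sketch of a hot deck in Python follows. Records are grouped into imputation cells defined by race, sex, and SES (school type or class size could be added to `cell_keys`), and each missing value is replaced by the value of a randomly chosen donor from the same cell. This is a random-donor hot deck; assigning the modal value of the cell, as suggested above, is a closely related deterministic alternative that shrinks within-cell variation. Field names are hypothetical, and cells with no donor are left missing rather than forced.

```python
import random
from collections import defaultdict


def hot_deck(records, target, cell_keys, seed=0):
    """Fill missing `target` values from random donors in the same cell.

    A cell is the combination of values on `cell_keys`. Each output record
    gets an `imputed` flag so imputed values can be tracked in analysis.
    """
    rng = random.Random(seed)

    def cell(rec):
        return tuple(rec[k] for k in cell_keys)

    # Collect observed values (donors) for each cell.
    donors = defaultdict(list)
    for rec in records:
        if rec[target] is not None:
            donors[cell(rec)].append(rec[target])

    filled = []
    for rec in records:
        new = dict(rec, imputed=False)
        if new[target] is None and donors.get(cell(rec)):
            new[target] = rng.choice(donors[cell(rec)])
            new["imputed"] = True
        filled.append(new)
    return filled


records = [
    {"race": "A", "sex": "F", "ses": "low", "group": "low"},
    {"race": "A", "sex": "F", "ses": "low", "group": None},
    {"race": "B", "sex": "M", "ses": "high", "group": None},  # no donor
]
for row in hot_deck(records, "group", ["race", "sex", "ses"]):
    print(row)
```

Fixing the random seed makes the imputation reproducible; the `imputed` flag lets an analysis check whether results change when imputed cases are dropped.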

Internal Inconsistency: The Case of NELS:88

A problem related to that of missing values is the number of inconsistent and unexpected answers (see Morse-Kelly, Stull, & Rigsby, 1995). Again, this is not necessarily a problem specific to NELS:88 but rather "comes with the territory" of working with large data sets. Indeed, given that we are dealing with adolescents who have been assured of their anonymity, we might well wonder why more students do not engage in "creative" survey responses.

It is possible to check for internal consistency, since the same information is elicited at several different places/questions in the same survey as well as in different surveys. For example, in the First Follow-Up survey, several questions about math include the response "not taking subject": hours spent on math homework (in and out of school), attitudes toward math classes, and math grades. In addition, there is a "lead" question that allows students to check off that they "have not yet taken math" and thus skip all the questions that follow. There is surprisingly little consistency from one question to another: the number of students "not taking math" varies with each question, and far more students claim not to take math in relation to doing homework (almost 1,000, combined in and out of school) than in relation to giving their grades (only 122). Examples of survey-to-survey inconsistency include students who gained (or lost) 12 or more family members between the Base Year and the First Follow-Up and students who were taking advanced math in eighth grade but remedial math in tenth grade.
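
Checks like these are easy to automate. The Python sketch below flags, for a single student, the three kinds of inconsistency just described. The field names are hypothetical stand-ins for the relevant NELS:88 items; a researcher would map them to actual variables and decide separately what to do with flagged cases.

```python
def consistency_flags(student):
    """Return a list of inconsistency flags for one student record.

    Hypothetical fields:
      not_taking_math: dict of question -> bool ("not taking subject" checked)
      family_size_by / family_size_f1: family size, Base Year / First Follow-Up
      math_level_by / math_level_f1: "advanced", "regular", or "remedial"
    """
    flags = []
    # Within-survey: the same fact should get the same answer everywhere.
    if len(set(student["not_taking_math"].values())) > 1:
        flags.append("not-taking-math differs across questions")
    # Survey-to-survey: implausibly large change in family size.
    if abs(student["family_size_f1"] - student["family_size_by"]) >= 12:
        flags.append("family size changed by 12 or more")
    # Survey-to-survey: advanced in grade 8 but remedial in grade 10.
    if student["math_level_by"] == "advanced" and student["math_level_f1"] == "remedial":
        flags.append("advanced in grade 8, remedial in grade 10")
    return flags


student = {
    "not_taking_math": {"homework": True, "grades": False},
    "family_size_by": 4,
    "family_size_f1": 17,
    "math_level_by": "advanced",
    "math_level_f1": "remedial",
}
print(consistency_flags(student))
```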

Unfortunately, these data problems are not confined to the student responses. The School files, for example, also include some interesting (or problematic) findings. When asked, "How many days in a row can a student be absent without an excuse before he or she is considered a truant?", 39% of the administrators responded "0 days." Is this a case of "mental truancy"? Similarly, 29% of the administrators consider a student a dropout after the student has missed 0 days of school. Schools report offering Advanced Placement courses in Business Math, General Math for grades 10-12, ninth-grade General Science, and so on. The fact that even administrators can give outlandish answers is an important cautionary note for procedures such as cross-checking answers and/or assigning higher credibility to answers given by adults.

Another kind of inconsistency results from "additive" questions, that is, a series of questions asking, for example, how many hours per week students spend on math homework, social studies homework, English homework, and so on, or asking students whether they live with their mother, father, grandmother, sister, and so on. If you were to sum the series of responses, you would find that some students spend 160 hours per week on homework (a week has only 168 hours), live with 42 family members, take 22 math classes, or are members of every club.
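
Summing a series of additive items and comparing the total with a plausibility ceiling makes such answers easy to find. In the Python sketch below, the subject names and the 40-hour research ceiling are hypothetical choices a researcher would have to justify; the only hard limit is the 168 hours in a week.

```python
HOURS_IN_WEEK = 168


def additive_total(responses):
    """Sum a series of additive items, skipping missing (None) answers."""
    return sum(v for v in responses.values() if v is not None)


def flag_total(responses, ceiling):
    """Return (total, flagged), where flagged means the total exceeds ceiling."""
    total = additive_total(responses)
    return total, total > ceiling


homework = {"math": 40, "english": 40, "science": 40, "social_studies": 40}
print(flag_total(homework, ceiling=HOURS_IN_WEEK))  # (160, False)
print(flag_total(homework, ceiling=40))             # (160, True)
```

The first check passes because 160 hours is physically possible, which is why a substantive ceiling, not just the length of a week, is needed to catch implausible totals.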

We are only now beginning to make use of inconsistent responses in large data sets as a way to understand other aspects of schooling processes and outcomes (see Morse-Kelly, Stull, & Rigsby, 1995). While the number of participants who give inconsistent or unexpected responses to any one question is generally small, these students, like those whose answers are missing, are not representative of the general population. Students who claim to do 160 hours of homework a week, or to take every math course, are more likely to be among the lowest-SES students, with below-average performance on conventional school measures of ability (e.g., standardized tests). This has implications for considering the correlation between, say, math ability on standardized tests and the number of hours spent on homework. To give another example, the sex distribution in the NELS:88 sample population is 50-50 (M:F), but among students claiming to have gone from advanced to remedial math, the sex distribution is 57-43. In other words, inconsistency here is more likely to be a male trait. This could have serious implications for studying, to draw on our previous study area, the gender gap as it relates to course-taking behaviors.

At the very least, the potential for unexpected responses in NELS:88 requires researchers to spend some preparation time familiarizing themselves with the variables and plotting an appropriate strategy for their research project. Outlandish responses may need to be recoded, perhaps via the hot deck method, in the same way missing answers are re-categorized. In some cases, it will be possible to cross-reference student answers with parent, teacher, or transcript data to ensure more accuracy, although you should defend the assumption implicit in this method that one group (parents/teachers) is inherently more "right" (accurate) than another (students). Finally, variables constructed from additive responses (total hours of homework, family size, number of math courses) may need to be truncated before they can be used in a regression analysis. Given the inverse relationship between claiming to spend 160 hours on homework and GPA, for example, the high number (160) may mask the actual relationship.
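
The masking effect can be seen with a few invented numbers. In the Python sketch below, four students show a perfectly positive association between homework hours and GPA, and one student reports 160 hours with a low GPA; the Pearson correlation is computed with the outlier left in, capped at a hypothetical ceiling of 20 hours, and set aside. Capping (top-coding) and excluding are two different forms of truncation, and the choice between them is a judgment the researcher should defend.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


hours = [2, 4, 6, 8, 160]          # invented weekly homework totals
gpa = [2.0, 2.5, 3.0, 3.5, 1.5]    # invented GPAs
ceiling = 20                       # hypothetical plausibility ceiling

as_reported = pearson(hours, gpa)
capped = pearson([min(h, ceiling) for h in hours], gpa)
kept = [(h, g) for h, g in zip(hours, gpa) if h <= ceiling]
excluded = pearson([h for h, _ in kept], [g for _, g in kept])
print(round(as_reported, 2), round(capped, 2), round(excluded, 2))
```

Here one implausible answer turns a positive relationship into a clearly negative one, and capping alone only softens the distortion.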

NELS:88: Omissions and Shortages

While we can appreciate the need to simplify the data-gathering instruments in a project of this magnitude, serious omissions can result. One of the most noticeable shortcomings of the NELS:88 data set is what we call a "lack of context." No school district information is given for any of the schools in the study. It would be useful to know, for example, how many other high schools there are in the district, that is, schools to which the student could transfer. Students may be more accepting of their school if it is the only viable option, or they may feel less trapped if they have other choices. The same is true of teachers. One of the variables in the School file is the number of teachers who left in the previous year. Leaving is a function not only of dissatisfaction with the original school but also of the possibility of transferring or being transferred to another. Schools with some dissatisfied teachers may have had few or no departures simply because there was no school to which to transfer. Similarly, in the School survey, administrators were asked about relationships between the school board and the community. In a very large school district there would most probably be little or no direct relationship, but in a one-school town the reverse would be true. The percentage of students in the school receiving free lunches is included in the restricted data file but not in the public use file. Imprecise as it is, this is the only measure of the social class of the student body as a whole. Nor is any information about school finances provided, other than a question or two about teachers' salaries, which are reported only in very broad categories, so it is difficult to include any analyses of spending priorities. A final example of lack of context is the disembodied opinions of teachers and administrators regarding why students drop out of school. Since we cannot tell from the NELS:88 data how much contact these authority figures may have had with problem students, there is no way for us to evaluate these opinions.

In general, more data are collected about attitudes than about behaviors. Moreover, even behavioral questions are phrased subjectively. Students and teachers are asked many questions about their habits and practices, that is, how many times students talk to their parents about high school courses, how much time teachers spend grading papers, and so on. The vague response categories ("several times," "a few times"), however, make these answers less precise and less helpful than they might otherwise be.

Even demographically, NELS:88 contains some omissions. There is a surprising lack of information about the teachers as a group, for example. While the age, experience, and sex are given for those teachers selected to complete a survey, no basis is given for determining how representative this subset of teachers is of the school's staff as a whole. This omission prevents the kinds of analyses that might follow from our exploration of the gender gap. For example, to what extent is the sex of the teacher correlated with the allocation of rewards (e.g., high ability group placement) in middle and high school? We have no variables that allow us to assess the work and intellectual

And, just as potentially valuable data are missing in relation to teachers, the same is true (surprisingly) of students. There are some large gaps in information about peers, for example. Does the student have a boyfriend or girlfriend (and what does that mean in terms of behaviors)? How often does the student date? How many friends does the student have? Are they from in school or out of school? Are the in-school friends the same types as the out-of-school friends? To what extent are the student's friends similar to peers in the neighborhood and/or the school? How are conflicting pressures between in-school and out-of-school friends resolved? Given that negative peer pressure is an important determinant of academic achievement and aspirations (see Rigsby, Stull, & Morse-Kelly, 1994), more explicit information should have been collected about the nature of these relationships. Aside from peers, the students' work experience is only vaguely detailed in NELS:88. While HS&B could be criticized for including too much work experience material, NELS:88 almost represents the other extreme. Questions about the student's current job are included, but not about the student's job history as a whole. And, while there are numerous questions in the Second Follow-Up School file about school-business relationships and job training programs, there is little actual work experience information. To evaluate the effectiveness of school programs, better wage information is necessary. It would also help to have some estimate of the prevailing wage in the community, since this varies considerably from region to region.

Finally, there are omissions of school data. The possibility of schooling effects has a long history in the literature, and education scholars believe that differences among schools, and in opportunities within schools, affect student performance. NELS:88, however, lacks enough information to allow researchers to address this issue adequately. While test information is given for the individual students included in the sample, no information is given for the school as a whole. Students in the sample were selected randomly, not chosen to be representative of the school they attended. School SAT/ACT averages are given, but cannot be included in any analyses because the percentage of schools reporting these results is so low.

Many of these problems could be resolved or reduced if linkages with other NCES data sets could be established. Since different schools are included in the different data sets, composite variables would have to be constructed for similar types of schools. This would greatly enhance the depth and range of research activities.
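A rough sketch of the composite-variable approach follows. It assumes a hypothetical school typology (region by urbanicity by public/private control) and a hypothetical school-level measure drawn from another NCES survey; none of the field names are actual NCES variable names. Values are averaged over schools of each type in one data set and then attached to schools of the same type in the other.

```python
from collections import defaultdict
from statistics import mean


def school_type(school):
    """Hypothetical typology: region x urbanicity x control (public/private)."""
    return (school["region"], school["urbanicity"], school["control"])


def composite_by_type(schools, measure):
    """Average `measure` over the schools of each type in one data set,
    skipping schools where the measure is missing."""
    groups = defaultdict(list)
    for school in schools:
        if school.get(measure) is not None:
            groups[school_type(school)].append(school[measure])
    return {stype: mean(values) for stype, values in groups.items()}


def attach_composite(schools, composites, name):
    """Copy each school in a second data set, adding the composite for its
    type under `name` (None when the type does not occur in the first set)."""
    return [dict(school, **{name: composites.get(school_type(school))}) for school in schools]


# Toy data; field names and values are invented for illustration.
other_survey = [
    {"region": "South", "urbanicity": "urban", "control": "public", "pct_free_lunch": 40},
    {"region": "South", "urbanicity": "urban", "control": "public", "pct_free_lunch": 60},
    {"region": "West", "urbanicity": "rural", "control": "public", "pct_free_lunch": 20},
]
nels_schools = [
    {"id": "A", "region": "South", "urbanicity": "urban", "control": "public"},
    {"id": "B", "region": "Northeast", "urbanicity": "suburban", "control": "private"},
]

composites = composite_by_type(other_survey, "pct_free_lunch")
linked = attach_composite(nels_schools, composites, "type_pct_free_lunch")
print(linked)
```

A type-level average discards within-type variation, so it is best treated as contextual information about similar schools rather than as a measured property of any single school.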

NELS:88 Design Flaws: The Definition of "Urban"

More serious than the lack of school district information is the problem we encountered while considering the educational experience of "at-risk," i.e., inner-city, students. In an earlier paper (Rigsby, Stull, & Morse-Kelly, 1994), we used the Urban/Suburban/Rural variable provided in the public use files to investigate the relationship between educational achievement and aspirations. Unexpectedly, students attending schools classified as "urban" were the same as or better off than their counterparts attending schools classified as "suburban." For example, on average, urban students scored the same as suburban students on the 10th grade mathematics test and higher on the 10th grade math/reading test. Urban students also scored higher on both the Locus of Control and Self Concept scales. These results raise the question of whether the urban category in the public use data files is too broad to be used in any investigation of large city education.

To investigate the issue of urbanicity further, we collected information from the 1980 and 1990 censuses for cities with populations over 600,000 in 1980. From the census, we collected the number of boys and girls aged 5/6 in 1980, who would thus have been 13/14 in 1988 and part of the NELS:88 population. We also collected the number of boys and girls aged 13/14 in 1980 and aged 13/14 in 1990 to determine how representative the NELS:88 urban group is of the largest cities in the United States. These figures were further broken down by race, children living in poverty, and single-parent households. When these figures are compared to the relevant figures in the NELS:88 data set, it is obvious that there is a problem, and a serious one. African-American students attending schools in the very largest cities are vastly underrepresented in the NELS:88 data set. In the North Central region, for example, according to census data, 26.74% of the 13/14 year-old population is African American. In the NELS:88 data set, however, only 4.5% of this same regional group is African American. White students, on the other hand, were overrepresented (see Table 2). These same discrepancies between census and NELS:88 data also appear when we compare "percent in poverty" and "single-parent households" (see Stull, Rigsby, & Morse-Kelly, 1995a).
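The size of such a discrepancy can be summarized as a simple representation ratio: the sample share divided by the population share. The function name is ours, not an NCES measure; the inputs are the North Central figures quoted in the text.

```python
def representation_ratio(sample_pct, population_pct):
    """Sample share / population share: 1.0 means proportional representation,
    below 1.0 underrepresentation, above 1.0 overrepresentation."""
    if population_pct <= 0:
        raise ValueError("population share must be positive")
    return sample_pct / population_pct


# North Central region, African American 13/14-year-olds (figures from the text).
census_pct = 26.74
nels_pct = 4.5
ratio = representation_ratio(nels_pct, census_pct)
print(f"ratio = {ratio:.2f}, gap = {census_pct - nels_pct:.2f} percentage points")
```

A ratio of about 0.17 means that African-American students appear in the NELS:88 urban group at roughly one-sixth of their census share.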



<Table 2 about here>

This problem results from the fact that NCES used the census concept of "central city" to define urban. The largest city in a Metropolitan Statistical Area (MSA) is a central city. Other "cities" may be included if they meet density/employment criteria. As a result, there is considerable variation in the populations of these central cities, ranging from New York City (7,164,742) to Benton Harbor, Michigan (14,246). Almost 70% of these cities had populations of less than 100,000, accounting for 24.04% of the total number of people defined as living in a "central city." In addition, some older declining cities may be excluded because of the employment criteria. For example, the Philadelphia PMSA includes three central cities: Philadelphia (1,646,713), Norristown (34,387), and Camden, New Jersey (82,537). Chester (40,834), a decaying city, was not included as a central city and was therefore categorized as suburban in NELS:88.

In other words, the NELS:88 data as they are now constructed cannot be used to study the education problems in very large cities, which are the cities most often cited in discussions of poverty and the problems of urban education (see Kantor & Brenzel, 1992; Reed & Sautter, 1990). Essentially, a researcher using the urbanicity variable provided in the public use data set would not be able to capture the depth of the social problems existing in the very large cities, or to be sure of a sample representative of critical urban demographic characteristics (Arnold, 1995). One wonders whether the "improvements" in the urban condition beginning to appear in the literature (e.g., recent claims from the Council of the Great City Schools in Washington, DC) have more to do with the "redefinition" of urban than with anything else. In the short run, adding one variable (city size, broadly defined) to both the public use and restricted files would go far to alleviate the problem.
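The single added variable proposed here could be as simple as a size band attached to each school's city. The sketch below is hypothetical: the 600,000 cut-off echoes the threshold used for the census comparison in this section and the 100,000 cut-off echoes the central-city figures cited in it, but the bands themselves are our illustration, not an NCES classification. Populations are those quoted in the text; under the census "central city" rule, every one of these cities is coded urban.

```python
# Hypothetical size bands; the cut-offs are illustrative analyst choices.
def city_size_band(population):
    if population >= 600_000:
        return "very large"
    if population >= 100_000:
        return "mid-size"
    return "small"


# Central cities and populations as quoted in the text.
central_cities = {
    "New York City": 7_164_742,
    "Philadelphia": 1_646_713,
    "Camden, NJ": 82_537,
    "Norristown": 34_387,
    "Benton Harbor, MI": 14_246,
}

for name, pop in central_cities.items():
    print(f"{name:<20} {pop:>10,}  {city_size_band(pop)}")
```

Even this coarse band would let a researcher separate New York City and Philadelphia from Benton Harbor, which the urban/suburban/rural variable alone treats alike.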

This, incidentally, is not a problem just with NELS:88; it applies to all of the recent NCES surveys, as each used the Census definition of "urban." Care must be taken in making any generalizations using the urban/suburban/rural designation included in the data set. In addition, very large city populations are underrepresented.

Conclusion 

Although initially intimidating, these large NCES data set projects are well worth investigating. Nowhere else is there such a wealth of data available to educational researchers on such a routine basis. The scope of possible analyses is limited only by the user's imagination. We heartily encourage even the most fainthearted to become involved.

Table 1

The Gender Gap in Test Scores by Race/Sex Groups

                     African American    Asian               Latino              White
TEST SCORES          Male      Female    Male      Female    Male      Female    Male      Female
                     n=768     n=846     n=420     n=502     n=961     n=981     n=5,8?0   n=5,914

BY: Standardized     44.95     44.86     56.05     56.25     47.32     46.04     53.41     53.05
Math Test Score      (8.6)     (10.4)    (9.12)    (?)       (8.8)     (8.5)     (9.8)     (9.5)
Mean (SD)            t = .22             t = -.33            t = 3.27            t = 2.04

F1: Standardized     45.26     45.32     55.99     56.20     47.60     46.60     53.06     52.83
Math Test Score      (8.7)     (9.0)     (9.9)     (8.6)     (9.1)     (8.5)     (9.9)     (9.1)
Mean (SD)            t = -.13            t = -.36            t = 2.49            t = 1.29

Population: Students participating in both Base Year (BY) and First Follow-Up (F1) surveys, excluding dropouts and Native Americans.
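The t values in Table 1 can be approximately reproduced from the reported means, standard deviations, and group sizes. The source does not state which two-sample test was used, so the sketch below assumes the unequal-variance (Welch) form of the statistic; with groups this large and SDs this similar, the pooled-variance form gives nearly the same value.

```python
import math


def t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic (unequal-variance form) from group summaries."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# F1 standardized math scores from Table 1, male minus female.
latino_t = t_from_summary(47.60, 9.1, 961, 46.60, 8.5, 981)
african_american_t = t_from_summary(45.26, 8.7, 768, 45.32, 9.0, 846)
print(f"Latino t = {latino_t:.2f}, African American t = {african_american_t:.2f}")
```

The results fall within a few hundredths of the published t = 2.49 and t = -.13; small differences are expected because the table reports rounded means and SDs.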

Table 2

Gender/Race Distributions by Region

                          1980 Census                   1990 Census    Base Year*     First Follow-Up*
Region          Group     5/6 yr olds    13/14 yr olds  13/14 yr olds  (G8urban)      (G10urban)
Northeast       Af Amer   17.16          18.81          17.20          5.40           4.95
                White     22.73          24.28          14.05          37.60          38.55
North Central   Af Amer   26.00          26.70          26.70          4.05           3.65
                White     10.07          15.66          15.68          42.30          42.?5
South           Af Amer   21.12          23.19          20.54          11.10          11.55
                White     23.15          22.49          10.68          32.05          32.15
West            Af Amer   7.96           9.18           3.34           2.70           2.05
                White     27.88          29.17          15.0           29.45          20.85

* Percentages based only on students who were identified as "urban" on the G8urban (G10urban) variable given in the public use file.
Census figures are based on data from the 17 cities with a population of 600,000 or more in 1980.